Information Dropout: learning optimal representations through noise
Authors: Alessandro Achille, Stefano Soatto
Abstract
We introduce Information Dropout, a generalization of dropout motivated by the Information Bottleneck principle that highlights how injecting noise into the activations can help in learning optimal representations of the data. Rooted in information-theoretic principles, Information Dropout includes several existing dropout methods as special cases, such as Gaussian Dropout and Variational Dropout, and, unlike classical dropout, it can learn and build representations that are invariant to nuisances of the data, such as occlusions and clutter. When the task is reconstruction of the input, we show that Information Dropout yields a variational autoencoder as a special case, thus providing a link between representation learning, information theory, and variational inference. Our experiments validate the theoretical intuitions behind our method, and we find that Information Dropout achieves comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network as well as to the test sample.
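To make the mechanism concrete, here is a minimal PyTorch sketch of an Information Dropout layer: multiplicative log-normal noise whose scale alpha(x) is predicted from the activations, plus a -log(alpha) information penalty. The 1x1 convolution, the sigmoid squashing, and the `max_alpha`/`beta` parameters are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class InformationDropout(nn.Module):
    """Minimal sketch of an Information Dropout layer for convolutional
    activations: multiplicative log-normal noise with a learned,
    input-dependent scale alpha(x), plus an information penalty."""

    def __init__(self, channels, max_alpha=0.7, beta=1.0):
        super().__init__()
        # Predict a per-unit noise scale from the activations themselves
        # (a 1x1 convolution is an illustrative choice).
        self.alpha_layer = nn.Conv2d(channels, channels, kernel_size=1)
        self.max_alpha = max_alpha
        self.beta = beta  # weight of the information penalty
        self.penalty = torch.tensor(0.0)

    def forward(self, x):
        # alpha(x) in (0, max_alpha]: std of log-noise, kept small for stability.
        alpha = self.max_alpha * torch.sigmoid(self.alpha_layer(x))
        # With a log-uniform prior and log-normal noise, the KL term reduces
        # to -log(alpha) up to a constant (a sketch of the paper's analysis
        # for ReLU activations; treat this as an assumption).
        self.penalty = -self.beta * torch.log(alpha + 1e-8).mean()
        if not self.training:
            return x  # the noise has unit median, so pass through at test time
        # eps ~ logNormal(0, alpha^2): multiplicative noise on the activations.
        eps = torch.exp(alpha * torch.randn_like(x))
        return x * eps
```

Because alpha(x) is learned per unit, the layer can inject strong noise on activations that carry nuisance information and almost none on task-relevant ones, which is the adaptivity the abstract refers to.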
Related resources
Information Dropout: Learning Optimal Representations Through Noisy Computation
The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise into the activations of a Deep Neural Network, a special case of which is the common practice of dropout...
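A hedged sketch of how such a regularized objective might be assembled, assuming the `InformationDropout` layer sketched above records its penalty on each forward pass (a hypothetical convention for this illustration, not an API from the paper):

```python
import torch.nn as nn

def information_dropout_loss(model, x, y, criterion=nn.CrossEntropyLoss()):
    # Cross-entropy plus the accumulated information penalties; assumes the
    # InformationDropout layers sketched earlier store `penalty` on each
    # forward pass.
    logits = model(x)
    penalty = sum(m.penalty for m in model.modules()
                  if isinstance(m, InformationDropout))
    return criterion(logits, y) + penalty
```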
Analyzing noise in autoencoders and deep networks
Autoencoders have emerged as a useful framework for unsupervised learning of internal representations, and a wide variety of apparently conceptually disparate regularization techniques have been proposed to generate useful features. Here we extend existing denoising autoencoders to additionally inject noise before the nonlinearity and at the hidden unit activations. We show that a wide variety...
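As an illustration of the three injection sites this abstract mentions, here is a small PyTorch sketch; the layer sizes and the single shared noise scale `sigma` are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class NoisyAutoencoder(nn.Module):
    """Sketch of an autoencoder injecting Gaussian noise at the input
    (denoising), before the hidden nonlinearity, and on the hidden
    activations."""

    def __init__(self, dim_in=784, dim_h=128, sigma=0.1):
        super().__init__()
        self.enc = nn.Linear(dim_in, dim_h)
        self.dec = nn.Linear(dim_h, dim_in)
        self.sigma = sigma

    def _noise(self, t):
        # Additive Gaussian noise, active only during training.
        if not self.training:
            return t
        return t + self.sigma * torch.randn_like(t)

    def forward(self, x):
        x_corrupt = self._noise(x)               # denoising-style input noise
        pre = self._noise(self.enc(x_corrupt))   # noise before the nonlinearity
        h = self._noise(torch.relu(pre))         # noise on hidden activations
        return self.dec(h)
```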
Low Dropout Based Noise Minimization of Active Mode Power Gated Circuit
The power gating technique reduces leakage power in a circuit. However, power gating causes large voltage fluctuations on the power rail during the transition from power-gating mode to active mode, due to the package inductance of the printed circuit board. These fluctuations may cause unwanted transitions in neighboring circuits. In this work, a power gating architecture is developed for minimizing power in a...
The Role of Information Complexity and Randomization in Representation Learning
A grand challenge in representation learning is to learn the different explanatory factors of variation behind high-dimensional data. Encoder models are often trained to optimize performance on the training data, when the real objective is to generalize well to unseen data. Although there is ample numerical evidence suggesting that noise injection (during training) at the representation level...
Follow the Leader with Dropout Perturbations
We consider online prediction with expert advice. Over the course of many trials, the goal of the learning algorithm is to achieve small additional loss (i.e., regret) compared to the loss of the best of a set of K experts. The two most popular algorithms are Hedge/Weighted Majority and Follow the Perturbed Leader (FPL). The latter algorithm first perturbs the loss of each expert by independent...
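A simplified NumPy sketch of the dropout-perturbation idea, in which each past loss is independently dropped before choosing the leader; this is an illustrative reading of the abstract, and the paper's exact perturbation scheme may differ.

```python
import numpy as np

def fpl_dropout(loss_matrix, drop_prob=0.5, seed=0):
    """Follow the Perturbed Leader with dropout perturbations (sketch):
    at each round, every past loss is independently dropped with
    probability drop_prob, and the learner follows the expert with the
    smallest perturbed cumulative loss."""
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    total = 0.0
    for t in range(T):
        past = loss_matrix[:t]                      # losses observed so far
        keep = rng.random(past.shape) >= drop_prob  # independent dropout mask
        perturbed = (past * keep).sum(axis=0)       # perturbed cumulative losses
        leader = int(np.argmin(perturbed))          # follow the perturbed leader
        total += loss_matrix[t, leader]             # incur this round's loss
    return total
```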
Journal: CoRR
Volume: abs/1611.01353
Pages: -
Published: 2016